Assessment of 18F-DCFPyL PSMA PET/CT and PET/MR quantitative parameters for reference standard organs: Inter-reader, inter-modality, and inter-patient variability

Prostate specific membrane antigen (PSMA)-based radiotracers have shown promise for prostate cancer assessment. Evaluation of quantitative variability and establishment of reference standards are important for optimal clinical and research utility. This work evaluates the variability of PSMA-based [18F]DCFPyL (PyL) PET quantitative reference standards. Consecutive eligible patients with biochemically recurrent prostate cancer were recruited for study participation from August 2016 to October 2017. After PyL tracer injection, whole body PET/CT (wbPET/CT) was obtained with subsequent whole body PET/MR (wbPET/MR). Two readers independently created regions of interest (ROIs) including a 40% standardized uptake value (SUV) threshold ROI of the whole right parotid gland and separate spherical ROIs in the superior, mid, and inferior gland. Additional liver (right lobe) and blood pool spherical ROIs were defined. Bland-Altman analysis, including limits of agreement (LOA), as well as interquartile range (IQR) and coefficient of variation (CoV), were used. Twelve patients with prostate cancer were recruited (mean age, 61.8 years; range, 54–72 years). One patient did not have wbPET/MR and was excluded. There was minimal inter-reader SUVmean variability (bias±LOA) for blood pool (-0.13±0.42; 0.01±0.41), liver (-0.55±0.82; -0.22±1.3), or whole parotid gland (-0.05±0.31; 0.08±0.24) for wbPET/CT and wbPET/MR, respectively. Greater inter-reader variability was present for the 1-cm parotid gland ROIs on both wbPET/CT and wbPET/MR. Comparing wbPET/CT to the subsequently acquired wbPET/MR, blood pool had a slight decrease in SUVmean. The liver and parotid gland showed a slight increase in activity, although the absolute bias ranged only from 0.45 to 1.28. The magnitude of inter-subject variability was higher for the parotid gland regardless of modality or reader.
In conclusion, liver, blood pool, and whole parotid gland quantitation show promise as reliable reference normal organs for clinical/research PET applications. Variability with 1-cm parotid ROIs may limit their use.

Common or agreed-upon reader interpretation guidelines for PSMA-based PET are needed. Prior publications, including the recently published E-PSMA standardized reporting guidelines [26], have included quantitative reference organs that might be used; however, evaluation of the variability and reproducibility of these reference standards has been more limited [26][27][28][29][30][31]. Therefore, the purpose of this work was to evaluate the variability of PSMA-based [18F]DCFPyL (PyL) PET quantitative reference standards.

Materials and methods
This prospective observational study was approved by the University of Wisconsin-Madison Institutional Review Board and maintained full compliance with the Health Insurance Portability and Accountability Act.

Patients
From August 2016 to October 2017, consecutive eligible subjects were recruited for enrollment in the study. Subjects were eligible for inclusion if they (1) had a history of prostate cancer with prior radical prostatectomy, (2) had current evidence of biochemical recurrence with a plan for salvage external-beam radiation therapy with or without androgen deprivation therapy, and (3) could undergo MRI. Patients were excluded if they had (1) a history of prior radiation therapy, chemotherapy, or androgen deprivation therapy for prostate cancer or (2) a history of any other malignancy within the last 2 years, other than skin basal cell or cutaneous superficial squamous cell carcinoma that had not metastasized, or superficial bladder cancer. Written informed consent was obtained from all participants.

Region of interest (ROI) generation
Two board-certified readers (4 and 6 years of experience in molecular imaging, E.M.L. and M.L., respectively) independently reviewed the PET/CT and PET/MR data and created volumetric ROIs using Mirada XD software (Oxford, UK). A 40% standardized uptake value (SUV) threshold method was used to define the whole right parotid gland. In addition, 1-cm spherical ROIs were placed in the superior, mid, and inferior parotid gland. Finally, a 3-cm spherical ROI was used to assess hepatic uptake (using the right hepatic lobe) and a 1-cm spherical ROI was used to assess blood pool (using the descending aorta). For the quantitative analysis included in the current study, prostate cancer-related lesions were not considered. This choice was made in part because multiple subjects did not have confirmed sites of PSMA+ recurrent disease.
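The two ROI types described above can be illustrated with a minimal sketch. It is not the Mirada XD implementation, which is not described in this text. It assumes that the 40% threshold is taken relative to the maximum SUV inside a reader-drawn bounding region, and that a spherical ROI includes every voxel whose center lies within the sphere. The function names, the synthetic volume, and the voxel size are hypothetical.

```python
import numpy as np

def threshold_roi_mean(suv, region_mask, fraction=0.40):
    """Mean SUV of voxels inside region_mask at or above fraction * regional SUVmax.

    Assumption: the threshold is relative to the maximum SUV within the region.
    """
    cutoff = fraction * suv[region_mask].max()
    roi_mask = region_mask & (suv >= cutoff)
    return float(suv[roi_mask].mean()), roi_mask

def spherical_roi_mean(suv, center_mm, diameter_mm, voxel_size_mm):
    """Mean SUV of voxels whose centers lie within a sphere of the given diameter."""
    grids = np.indices(suv.shape).astype(float)
    dist_sq = np.zeros(suv.shape)
    for axis in range(3):
        coords_mm = grids[axis] * voxel_size_mm[axis]
        dist_sq += (coords_mm - center_mm[axis]) ** 2
    sphere_mask = dist_sq <= (diameter_mm / 2.0) ** 2
    return float(suv[sphere_mask].mean()), sphere_mask

# Synthetic volume (not study data): low background plus a high-uptake cube.
rng = np.random.default_rng(0)
suv = rng.uniform(0.5, 1.5, size=(40, 40, 40))
suv[10:20, 10:20, 10:20] += 10.0  # hypothetical "gland"

# Hypothetical reader-drawn bounding region around the gland.
gland_box = np.zeros(suv.shape, dtype=bool)
gland_box[5:25, 5:25, 5:25] = True

whole_mean, whole_mask = threshold_roi_mean(suv, gland_box)

# 1-cm sphere centered in the gland, assuming 2-mm isotropic voxels.
sphere_mean, sphere_mask = spherical_roi_mean(
    suv, center_mm=(30.0, 30.0, 30.0), diameter_mm=10.0,
    voxel_size_mm=(2.0, 2.0, 2.0),
)
```

In this sketch the threshold ROI adapts to the uptake distribution of the gland, whereas the result of the spherical ROI depends on where the center is placed.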

Statistical analysis
SUVmean was used for comparison. Bland-Altman plots, including calculation of bias and limits of agreement (LOA), were used and interquartile range (IQR) was assessed [32]. The coefficient of variation (CoV) was calculated by dividing the standard deviation by the population mean and multiplying the result by 100. Data were collated using Microsoft Excel (v. 2010, Microsoft, Redmond, WA) and additional statistical analysis, including Bland-Altman analysis, was performed using Matlab (MathWorks, Natick, MA).
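As an illustration of the Bland-Altman quantities named above, the following sketch computes the bias (mean paired difference) and the limits of agreement. It is written in Python rather than the Matlab used in the study, and it assumes the conventional definition of the limits as bias ± 1.96 times the sample standard deviation of the differences; the text does not state the exact convention used. The reader values are hypothetical.

```python
import numpy as np

def bland_altman(reader_a, reader_b):
    """Return bias, LOA half-width, and (lower, upper) limits of agreement.

    Assumption: LOA = bias +/- 1.96 * sample SD of the paired differences.
    """
    a = np.asarray(reader_a, dtype=float)
    b = np.asarray(reader_b, dtype=float)
    diff = a - b
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, half_width, (bias - half_width, bias + half_width)

# Hypothetical paired SUVmean measurements from two readers (not study data).
reader_1 = [1.0, 2.0, 3.0, 4.0]
reader_2 = [1.5, 1.5, 3.5, 3.5]

bias, half_width, (lower, upper) = bland_altman(reader_1, reader_2)
print(f"bias = {bias:.2f}, LOA = {lower:.2f} to {upper:.2f}")
```

Under this convention, a "bias±LOA" entry can be read as the mean difference between readers followed by the half-width of the agreement interval.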

Results
Twelve patients were recruited (mean age, 61.8 years; range, 54-72 years). One patient did not complete the wbPET/MR and was therefore excluded.

Inter-reader variability
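There was minimal inter-reader SUVmean variability (bias±LOA) for blood pool (-0.13±0.42; 0.01±0.41), liver (-0.55±0.82; -0.22±1.3), and whole parotid gland (-0.05±0.31; 0.08±0.24) for wbPET/CT and wbPET/MR, respectively ([...] 2). Much of the inter-reader variability for the 1-cm parotid gland ROIs was likely due to spatial variability in parotid gland uptake (Fig 3).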

Inter-subject variability
The magnitude of variability was higher for the parotid gland, compared to liver and blood pool, regardless of modality or reader. When assessed using a percentage coefficient of variation, the difference between liver and parotid variability was less pronounced: the CoV for liver SUVmean ranged from 24-31%, compared to 30-36%, 25-38%, 35-38%, and 33-45% for the whole, superior, mid, and inferior parotid gland measurements, respectively. Overall, blood pool had the lowest CoV, ranging from 15-21%. Inter-subject reference organ SUVmean variability is detailed in Table 2 and highlighted in Fig 5.
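The inter-subject measures reported here can be sketched as follows. The CoV follows the definition in the Statistical analysis section (standard deviation divided by the mean, multiplied by 100). The use of the sample standard deviation (ddof=1) and of linearly interpolated quartiles are assumptions, since the text does not specify either choice, and the SUVmean values are hypothetical.

```python
import numpy as np

def cov_percent(values):
    """Coefficient of variation in percent: 100 * SD / mean.

    Assumption: sample standard deviation (ddof=1).
    """
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()

def interquartile_range(values):
    """Difference between the 75th and 25th percentiles (linear interpolation)."""
    q1, q3 = np.percentile(np.asarray(values, dtype=float), [25, 75])
    return q3 - q1

# Hypothetical SUVmean values for one organ across five subjects (not study data).
organ_suvmean = [4.0, 5.0, 6.0, 7.0, 8.0]

cov = cov_percent(organ_suvmean)          # about 26.4%
iqr = interquartile_range(organ_suvmean)  # 2.0
print(f"CoV = {cov:.1f}%, IQR = {iqr:.1f}")
```

Because the CoV is normalized by the mean, an organ with higher absolute uptake, such as the parotid gland, can show a large absolute spread while its CoV remains closer to that of the liver.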

Discussion
This study sought to evaluate the variability of PSMA-based [18F]DCFPyL (PyL) PET quantitative reference standards. We found that liver, blood pool, and whole parotid gland quantitation show promise as reliable reference organs. Greater variability with 1-cm parotid ROIs may limit their use. Liver SUVmean had less inter-subject variability than parotid SUVmean, which may be important to consider when establishing treatment or scoring cut-off values or thresholds.
Defining appropriate quantitative reference standards for PSMA-based PyL PET might allow for optimized interpretation of repeat studies and greater generalizability of research results. The PROMISE criteria proposed using a relative expression score with intermediate (score 2) activity defined as equal to or above liver but lower than parotid gland and high (score 3) activity defined as equal to or above parotid gland [33]. Qualitative evaluation using these reference organs was also included in the E-PSMA guidelines [26]. However, quantitative reference organs and clinical evaluation based on the referenced guidelines may not be used regularly in clinical practice at many imaging centers. Still, the results of the current study support the use of liver and blood pool uptake as quantitative parameters. Indeed, liver SUVmean had minimal inter-reader variability, with an absolute mean bias between readers of only 0.55 and 0.22 for PET/CT and PET/MR, respectively. The use of liver uptake for visual quantification of disease activity/score has been previously established, most notably through the Lugano criteria for lymphoma [34]. More recently, a phase II trial evaluating [177Lu]PSMA-617 in the setting of metastatic prostate cancer used, as a reference, lesion 68Ga-PSMA-11 uptake significantly greater than that of liver [35]. Another study examining the change in 68Ga-PSMA uptake in 43 patients treated systemically for metastatic castration-resistant prostate cancer found a median change in SUVmax of -13.3% (IQR: -44 to 41%) [36].
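The relative expression score described above can be sketched as a simple comparison against reference organ uptake. Scores 2 and 3 follow the definitions given in this text; the definitions of scores 0 (below blood pool) and 1 (equal to or above blood pool but lower than liver) are taken from the PROMISE publication [33] and are not stated here. PROMISE describes a visual comparison, so applying the cut-offs to SUVmean values is an assumption made only for illustration, and the function name and input values are hypothetical.

```python
def promise_expression_score(lesion, blood_pool, liver, parotid):
    """Map lesion uptake to a 0-3 score relative to reference organ uptake.

    Assumes blood_pool < liver < parotid, with all inputs in the same SUV units.
    """
    if not (blood_pool < liver < parotid):
        raise ValueError("expected blood_pool < liver < parotid")
    if lesion >= parotid:
        return 3  # high: equal to or above parotid gland
    if lesion >= liver:
        return 2  # intermediate: equal to or above liver, below parotid gland
    if lesion >= blood_pool:
        return 1  # low: equal to or above blood pool, below liver
    return 0      # below blood pool

# Hypothetical reference organ values for one subject (not study data).
references = {"blood_pool": 1.5, "liver": 6.0, "parotid": 10.0}

scores = [promise_expression_score(x, **references) for x in (1.0, 3.0, 7.4, 12.0)]
print(scores)
```

Because each cut-off is a reference organ measurement, variability in the liver or parotid value shifts the score boundaries directly, which is why the reproducibility of these measurements matters for such scoring.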
Use of parotid gland uptake as a reference quantitative parameter is more nuanced. First, in the current study, evaluation of 1-cm ROIs showed relatively high inter-reader variability (absolute mean bias from 0.76 to 4.63 and limits of agreement from 2.78 to 11.3). This relatively high variability is most likely due to heterogeneous expression throughout the parotid gland and to variable blood flow. SUVmean from a whole parotid ROI resulted in lower absolute mean bias (±limits of agreement) of 0.05 (±0.31) and 0.08 (±0.24) for wbPET/CT and wbPET/MR, respectively. Therefore, if parotid uptake is used as a reference standard, it might be prudent to use a whole gland ROI to minimize this variability. Similarly, using the parotid gland as a qualitative, or visual, reference standard [26] may also minimize the effect of intra-gland heterogeneity.
Bland-Altman analysis and evaluation of repeatability are often best interpreted in relation to the clinical context in which they might be used. For example, Rowe et al. reported a median SUVmax of 7.4 (IQR: 4.2-12.9) for suspected osseous metastatic lesions that were 'definitively' or 'equivocally' positive on [18F]DCFPyL PET/CT [19]. Given the proximity of the average liver SUVmean reported in our study to the reported median for osseous metastases, the liver might serve as an appropriate standard when evaluating sites of possible metastatic disease.
When comparing modalities, blood pool SUVmean was lower while liver and parotid gland SUVmean were higher for the wbPET/MR compared to wbPET/CT, although the absolute bias was low (0.45-1.28). These differences were likely attributable to study design, as the wbPET/MR was acquired approximately 120 minutes after the [18F]DCFPyL tracer injection whereas wbPET/CT was obtained 60 minutes after injection. This is supported by the work of Ferreira et al., which found a weak positive correlation between liver SUVpeak and [18F]DCFPyL uptake time [27]. A similar finding was seen in a study using [68Ga]PSMA [37]. Randomization of acquisition order would be the ideal way to separate the effect of modality and scanner from that of uptake time. In one study with [68Ga]PSMA, an average of 20% higher SUVmax was calculated for PET/CT compared to same-day PET/MR when the order was randomized [38]. Further work with randomization of the timing of the PET/CT and PET/MR acquisitions with the [18F]DCFPyL tracer may also be useful.
This study had important limitations. First, a single time point was utilized and thus test-retest repeatability cannot be assessed. Second, as discussed previously, differences in acquisition timing and detector characteristics confound the evaluation of wbPET/MR compared to wbPET/CT. Third, other features that can affect variation in reference organ quantification, such as physiologic conditions and uptake time, were not directly assessed in this study, and future research in these areas may be useful. Finally, the overall study size was small, and confirmation of the results in future larger trials may be beneficial.
In conclusion, liver, blood pool, and whole parotid gland quantitation show promise as reliable reference normal organs for clinical and research PET applications. Variability with 1-cm parotid ROIs may limit their use.